Recommending Library Methods: An Evaluation of the Vector Space Model (VSM) and Latent Semantic Indexing (LSI)

نویسندگان

  • Frank McCarey
  • Mel Ó Cinnéide
  • Nicholas Kushmerick
چکیده

The development and maintenance of a reuse repository requires significant investment, planning and managerial support. To minimise risk and ensure a healthy return on investment, reusable components should be accessible, reliable and of a high quality. In this paper we concentrate on accessability; we describe a technique which enables a developer to effectively and conveniently make use of large scale libraries. Unlike most previous solutions to component retrieval, our tool, RASCAL, is a proactive component recommender. RASCAL recommends a set of task-relevant reusable components to a developer. Recommendations are produced using Collaborative Filtering (CF). We compare and contrast CF effectiveness when using two information retrieval techniques, namely Vector Space Model (VSM) and Latent Semantic Indexing (LSI). We validate our technique on real world examples and find overall results are encouraging; notably, RASCAL can produce reasonably good recommendations when they are most valuable i.e., at an early stage in code development.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On the Performance of Latent Semantic Indexing based Information Retrieval

Conventional vector-based Information Retrieval (IR) models: Vector Space Model (VSM) and Generalized Vector Space Model (GVSM) represents documents and queries as vectors in a multidimensional space. This high dimensional data places great demands on computing resources. To overcome these problems, Latent Semantic Indexing (LSI), a variant of VSM, projects the documents into a lower dimensiona...

متن کامل

Analysis of a Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for Information Retrieval

Latent Semantic Indexing (LSI), a variant of classical Vector Space Model (VSM), is an Information Retrieval (IR) model that attempts to capture the latent semantic relationship between the data items. Mathematical lattices, under the framework of Formal Concept Analysis (FCA), represent conceptual hierarchies in data and retrieve the information. However, both LSI and FCA use the data represen...

متن کامل

Latent Semantic Indexing for Patent Documents

Since the huge database of patent documents is continuously increasing, the issue of classifying, updating and retrieving patent documents turned into an acute necessity. Therefore, we investigate the efficiency of applying Latent Semantic Indexing, an automatic indexing method of information retrieval, to some classes of patent documents from the United States Patent Classification System. We ...

متن کامل

Latent Semantic Indexing Based on Factor Analysis

The main purpose of this paper is to propose a novel latent semantic indexing (LSI), statistical approach to simultaneously mapping documents and terms into a latent semantic space. This approach can index documents more effectively than the vector space model (VSM). Latent semantic indexing (LSI), which is based on singular value decomposition (SVD), and probabilistic latent semantic indexing ...

متن کامل

Approximate Dimension Equalization in Vector-based Information Retrieval

Vector-based information retrieval methods such as the vector space model (VSM), latent semantic indexing (LSI), and the generalized vector space model (GVSM) represent both queries and documents by high-dimensional vectors learned from analyzing a training corpus of text. VSM scales well to large collections, but cannot represent term–term correlations, which prevents it from being used in tra...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006